Methods and apparatus for data enhanced automated model generation

ABSTRACT

Methods, apparatus, systems, and articles of manufacture for data enhanced automated model generation are disclosed. Example instructions, when executed, cause at least one processor to access a request to generate a machine learning model to perform a selected task, generate task knowledge based on a previously generated machine learning model, create a search space based on the task knowledge, and generate a machine learning model using neural architecture search, the neural architecture search beginning based on the search space.

RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 63/222,938, which was filed on Jul. 16, 2021. U.S. Provisional Patent Application No. 63/222,938 is hereby incorporated by reference in its entirety.

FIELD OF THE DISCLOSURE

This disclosure relates generally to machine learning and, more particularly, to methods and apparatus for data enhanced automated model generation.

BACKGROUND

Machine learning is an important enabling technology for the revolution currently underway in artificial intelligence, driving truly remarkable advances in fields such as object detection, image classification, speech recognition, natural language processing, and many more. Models are created using machine learning that, when utilized, enable an output to be generated based on an input. Neural architecture search enables various architectures to be searched when creating a machine learning model.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an example system implemented in accordance with the teachings of this disclosure for data enhanced automated model generation.

FIG. 2 is a block diagram of an example process flow utilizing the example system of FIG. 1.

FIG. 3 is a flowchart representative of example machine readable instructions and/or example operations that may be executed by example processor circuitry to implement the example knowledge builder circuitry and the example model builder circuitry of FIG. 1.

FIG. 4 is a flowchart representative of example machine readable instructions and/or example operations that may be executed by example processor circuitry to implement the example target hardware of FIG. 1.

FIG. 5 is a block diagram of an example processing platform including processor circuitry structured to execute the example machine readable instructions and/or the example operations of FIG. 3 to implement the example knowledge builder circuitry and the example model builder circuitry of FIG. 1.

FIG. 6 is a block diagram of an example implementation of the processor circuitry of FIG. 5.

FIG. 7 is a block diagram of another example implementation of the processor circuitry of FIG. 5.

FIG. 8 is a block diagram of an example software distribution platform (e.g., one or more servers) to distribute software (e.g., software corresponding to the example machine readable instructions of FIGS. 3 and/or 4) to client devices associated with end users and/or consumers (e.g., for license, sale, and/or use), retailers (e.g., for sale, re-sale, license, and/or sub-license), and/or original equipment manufacturers (OEMs) (e.g., for inclusion in products to be distributed to, for example, retailers and/or to other end users such as direct buy customers).

In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts. The figures are not to scale.

As used in this patent, stating that any part (e.g., a layer, film, area, region, or plate) is in any way on (e.g., positioned on, located on, disposed on, or formed on, etc.) another part, indicates that the referenced part is either in contact with the other part, or that the referenced part is above the other part with one or more intermediate part(s) located therebetween.

Unless specifically stated otherwise, descriptors such as “first,” “second,” “third,” etc., are used herein without imputing or otherwise indicating any meaning of priority, physical order, arrangement in a list, and/or ordering in any way, but are merely used as labels and/or arbitrary names to distinguish elements for ease of understanding the disclosed examples. In some examples, the descriptor “first” may be used to refer to an element in the detailed description, while the same element may be referred to in a claim with a different descriptor such as “second” or “third.” In such instances, it should be understood that such descriptors are used merely for identifying those elements distinctly that might, for example, otherwise share a same name.

As used herein, “substantially real time” refers to occurrence in a near instantaneous manner recognizing there may be real world delays for computing time, transmission, etc. Thus, unless otherwise specified, “substantially real time” refers to real time +/− 1 second.

As used herein, the phrase “in communication,” including variations thereof, encompasses direct communication and/or indirect communication through one or more intermediary components, and does not require direct physical (e.g., wired) communication and/or constant communication, but rather additionally includes selective communication at periodic intervals, scheduled intervals, aperiodic intervals, and/or one-time events.

As used herein, “processor circuitry” is defined to include (i) one or more special purpose electrical circuits structured to perform specific operation(s) and including one or more semiconductor-based logic devices (e.g., electrical hardware implemented by one or more transistors), and/or (ii) one or more general purpose semiconductor-based electrical circuits programmed with instructions to perform specific operations and including one or more semiconductor-based logic devices (e.g., electrical hardware implemented by one or more transistors). Examples of processor circuitry include programmed microprocessors, Field Programmable Gate Arrays (FPGAs) that may instantiate instructions, Central Processor Units (CPUs), Graphics Processor Units (GPUs), Digital Signal Processors (DSPs), XPUs, or microcontrollers and integrated circuits such as Application Specific Integrated Circuits (ASICs). For example, an XPU may be implemented by a heterogeneous computing system including multiple types of processor circuitry (e.g., one or more FPGAs, one or more CPUs, one or more GPUs, one or more DSPs, etc., and/or a combination thereof) and application programming interface(s) (API(s)) that may assign computing task(s) to whichever one(s) of the multiple types of the processing circuitry is/are best suited to execute the computing task(s).

DETAILED DESCRIPTION

Neural Architecture Search (NAS) is an approach for exploring different machine learning algorithms for solving machine learning tasks. NAS algorithms take a significant amount of resources (e.g., compute resources, temporal resources, energy resources, etc.) to identify acceptable architectures. Most of these resources are expended by examining non-optimal architecture configurations during an exploration stage. Existing NAS algorithms do not provide clear explanations of the decisions for selecting a particular architecture, and such algorithms do not benefit from collected data regarding previous findings (e.g., sequence of operations, FLOPs, etc.) or target hardware capabilities. This information is typically discarded and does not benefit future applications of the NAS algorithm.

Due to the complexity of the task, NAS solutions tend to forget any insights from one run to the next. The initial conditions/configurations in previous solutions are independent of any other configurations used previously.

Existing NAS approaches do not reuse prior execution data related to models identified via NAS. That is, existing approaches do not benefit from collected knowledge about the task that the model will perform (e.g., detection, segmentation, etc.). When performing NAS, existing approaches start from scratch every time when looking for better models. Many existing NAS approaches also require significant reconfiguration when moving to different tasks, and such approaches do not generalize the neural network architecture search process.

Example approaches disclosed herein analyze state-of-the-art and emerging workloads and collect historical information about the models including performance, sequence of operations, size, floating point operations per second (FLOPS), etc. for each operation.

In examples disclosed herein, a user provides a task (object recognition, segmentation, etc.) and an objective (accuracy, latency, mix, etc.), and the NAS system selects starting hyperparameters/configuration information which include the best configuration for the task, the objective, and, in some examples, the target hardware on which the model is to be executed.
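
The selection of a starting configuration from a task, an objective, and optional target hardware can be sketched as follows. This is a minimal illustration only: the `ModelRequest` type, the `HISTORY` records, the `score` field, and the `select_starting_config` helper are hypothetical names and made-up values introduced for this sketch, not elements of the disclosed system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRequest:
    """User input: task, objective, and optional target hardware."""
    task: str
    objective: str
    target_hardware: Optional[str] = None


# Hypothetical historical records; every value is made up for illustration.
HISTORY = [
    {"task": "segmentation", "objective": "latency", "hardware": "hw_a",
     "config": {"num_layers": 12, "kernel_size": 3}, "score": 0.71},
    {"task": "segmentation", "objective": "latency", "hardware": "hw_b",
     "config": {"num_layers": 8, "kernel_size": 5}, "score": 0.78},
    {"task": "object_recognition", "objective": "accuracy", "hardware": "hw_a",
     "config": {"num_layers": 20, "kernel_size": 3}, "score": 0.90},
]


def select_starting_config(request, history):
    """Return the best-scoring archived configuration matching the request.

    Falls back to task/objective matches when no record exists for the
    requested hardware, and returns None when nothing matches at all.
    """
    matches = [r for r in history
               if r["task"] == request.task
               and r["objective"] == request.objective]
    if request.target_hardware is not None:
        on_hardware = [r for r in matches
                       if r["hardware"] == request.target_hardware]
        if on_hardware:
            matches = on_hardware
    if not matches:
        return None
    return max(matches, key=lambda r: r["score"])["config"]
```

In this sketch, omitting the target hardware simply widens the set of historical records considered, which mirrors the optional nature of the hardware input.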

Collected execution and/or performance information provides insights and guides the initial conditions of the search for an architecture that satisfies the requirements. The system also collects target hardware information, making the system hardware-aware and allowing the system to refine for the specific target hardware(s). For example, the system can avoid dilated 7×7 convolution kernels if such a kernel does not perform well (e.g., latency on the selected target hardware exceeds a threshold amount of latency).
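
The hardware-aware pruning described above can be sketched as a simple latency filter. The function name `prune_operations`, the operation labels, and the latency figures are assumptions made for illustration; the disclosure does not prescribe a particular data layout or threshold.

```python
def prune_operations(candidate_ops, measured_latency_ms, threshold_ms):
    """Drop operations whose measured latency exceeds the threshold.

    Operations with no measurement for the target hardware are kept,
    since there is no evidence yet that they perform poorly.
    """
    kept = []
    for op in candidate_ops:
        latency = measured_latency_ms.get(op)
        if latency is None or latency <= threshold_ms:
            kept.append(op)
    return kept


# Hypothetical measurements for one target hardware (illustrative values).
latencies = {"conv3x3": 1.2, "dilated_conv7x7": 9.5}
ops = ["conv3x3", "dilated_conv7x7", "sep_conv5x5"]
remaining = prune_operations(ops, latencies, threshold_ms=5.0)
```

Keeping unmeasured operations is a design choice of this sketch; a stricter variant could instead exclude anything without telemetry.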

Example approaches disclosed herein provide the user with the generated model and the reasoning behind the choices made when selecting operations. The decisions are based on the collected historical data and the task knowledge obtained from the knowledge builder (KB). Providing the reasoning for decisions can result in insights for future HW improvements (e.g., optimize specific kernels, memory BW, etc.).

FIG. 1 is a block diagram of an example system implemented in accordance with the teachings of this disclosure for data enhanced automated model generation. The example system 100 of FIG. 1 includes knowledge builder circuitry 105 that receives a user input 110, and model builder circuitry 115 that builds and provides a model to target hardware 120.

The example system of FIG. 1 presents an end-to-end solution that receives information from the user (objective, task, target HW), analyzes this information using a knowledge base, and builds suggestions for the search space and initial configuration for the NAS approach. The system is agnostic to the NAS approach to be used, enabling a user to decide on the state-of-the-art approach that will receive the suggested configuration.

The example user input 110 includes information including, for example, an objective of a machine learning model, a task to be performed by the machine learning model, and, optionally, one or more characteristics of a target hardware on which the machine learning model is to be executed. The task (object recognition, segmentation, etc.) will include input layer requirements, output layer requirements, and data requirements. The system of FIG. 1 is flexible enough that the user can provide information used to influence the model generation (e.g., by specifying whether the current task is similar to another task, and/or by specifying additional layers (not yet in the knowledge base, or associated with a different task) to include in the search space).

The knowledge builder circuitry 105 of FIG. 1 may be instantiated (e.g., creating an instance of, bring into being for any length of time, materialize, implement, etc.) by processor circuitry such as a central processing unit executing instructions. Additionally or alternatively, the knowledge builder circuitry 105 of FIG. 1 may be instantiated (e.g., creating an instance of, bring into being for any length of time, materialize, implement, etc.) by an ASIC or an FPGA structured to perform operations corresponding to the instructions. It should be understood that some or all of the circuitry of FIG. 1 may, thus, be instantiated at the same or different times (and/or by different hardware circuitry). Some or all of the circuitry may be instantiated, for example, in one or more threads executing concurrently on hardware and/or in series on hardware. Moreover, in some examples, some or all of the circuitry of FIG. 1 may be implemented by one or more virtual machines and/or containers executing on the microprocessor.

The example knowledge builder circuitry 105 of the illustrated example of FIG. 1 includes request accessor circuitry 130, hardware data orchestration circuitry 135, task data orchestration circuitry 140, and a knowledge datastore 145. The example knowledge builder circuitry 105 archives information for models and hardware into the knowledge datastore 145. If the hardware is not known in the knowledge datastore 145, the user is able to cause the system to execute on the target hardware 120 to extract performance metrics. A report of such performance metrics is obtained and added to the knowledge datastore 145 to build task knowledge. If the task is not in the knowledge datastore 145, the task data orchestration circuitry 140 creates task knowledge for the new tasks. FIG. 2 illustrates the process for creating or updating the knowledge datastore 145.

In examples disclosed herein, the knowledge datastore 145 of the knowledge builder circuitry 105 can be pre-populated with state-of-the-art (SOTA) or custom models and hardware configurations. In addition, the knowledge datastore 145 can be updated at any time based on, for example, statistics collected by the target hardware 120. In examples disclosed herein, the knowledge datastore 145 separates the models by tasks. To build the task knowledge, model information is retrieved from the knowledge datastore 145 for the specific task, and features are extracted from the models. In cases of a new or custom task, similar tasks/models are retrieved based on the user input. These features include, but are not limited to, the framework used to train the model, the HW specs and any information for mapping the model (latencies, etc.) including HW telemetry, the performance objective, sequence of operations, number of FLOPs, dataset used, number of layers, etc. These features are then ranked by hardware features, objective, etc. The extracted and ranked features are then considered task knowledge, which is then archived in the knowledge datastore 145 for future use.
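
The extract-then-rank step that produces task knowledge can be sketched as follows. The record fields (`model`, `operations`, `flops`, `num_layers`, `accuracy`, `latency_ms`), the sample values, and the `build_task_knowledge` helper are hypothetical and chosen only to illustrate the idea; an actual datastore schema may differ.

```python
def build_task_knowledge(model_records, task, rank_key, higher_is_better=True):
    """Extract per-model features for one task and rank them by one metric."""
    extracted = [
        {
            "model": r["model"],
            "operations": r["operations"],
            "flops": r["flops"],
            "num_layers": r["num_layers"],
            rank_key: r[rank_key],
        }
        for r in model_records
        if r["task"] == task
    ]
    return sorted(extracted, key=lambda f: f[rank_key],
                  reverse=higher_is_better)


# Hypothetical archived model information (illustrative values only).
RECORDS = [
    {"model": "m1", "task": "segmentation", "operations": ["conv3x3", "relu"],
     "flops": 4.0e9, "num_layers": 18, "accuracy": 0.74, "latency_ms": 21.0},
    {"model": "m2", "task": "segmentation", "operations": ["conv5x5", "relu"],
     "flops": 7.5e9, "num_layers": 34, "accuracy": 0.81, "latency_ms": 35.0},
    {"model": "m3", "task": "object_recognition", "operations": ["conv3x3"],
     "flops": 1.8e9, "num_layers": 10, "accuracy": 0.69, "latency_ms": 9.0},
]

by_accuracy = build_task_knowledge(RECORDS, "segmentation", "accuracy")
by_latency = build_task_knowledge(RECORDS, "segmentation", "latency_ms",
                                  higher_is_better=False)
```

Ranking on a single key keeps the sketch short; ranking by several criteria (e.g., hardware features and objective together) would need a composite sort key.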

The example request accessor circuitry 130 of the illustrated example of FIG. 1 receives a request for generation of a model to perform a selected task. In examples disclosed herein, the user input 110 received by the request accessor circuitry 130 includes information including, for example, an objective of a machine learning model, a task to be performed by the machine learning model, and, in some examples, one or more characteristics of a target hardware on which the machine learning model is to be executed. The request may be formatted as, for example, a request received at a web server or a request formatted in a structured data format (e.g., a JavaScript object notation (JSON) format, an extensible markup language (XML) format, etc.). The example request accessor circuitry 130 accesses hardware data orchestration information via the hardware data orchestration circuitry 135 and task data orchestration information via the task data orchestration circuitry 140. The accessed information (if available) and the request are provided to the search space management circuitry 160 of the model builder circuitry 115.
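
As one illustration of a request in a structured data format, the sketch below parses a JSON payload into the three pieces of user input named above. The field names (`task`, `objective`, `target_hardware`) and the `parse_request` helper are assumptions for this example; the disclosure does not define a request schema.

```python
import json


def parse_request(payload):
    """Parse a JSON request, requiring a task and an objective.

    The target hardware is optional and defaults to None when absent.
    """
    data = json.loads(payload)
    missing = [key for key in ("task", "objective") if key not in data]
    if missing:
        raise ValueError("request is missing fields: " + ", ".join(missing))
    return {
        "task": data["task"],
        "objective": data["objective"],
        "target_hardware": data.get("target_hardware"),
    }


# Hypothetical request body (field names are illustrative).
example_payload = '{"task": "segmentation", "objective": "latency"}'
parsed = parse_request(example_payload)
```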

In some examples, the apparatus includes means for accessing a request. For example, the means for accessing may be implemented by the request accessor circuitry 130. In some examples, the request accessor circuitry 130 may be instantiated by processor circuitry such as the example processor circuitry 512 of FIG. 5. For instance, the request accessor circuitry 130 may be instantiated by the example general purpose processor circuitry 600 of FIG. 6 executing machine executable instructions such as that implemented by at least block 310 of FIG. 3. In some examples, the request accessor circuitry 130 may be instantiated by hardware logic circuitry, which may be implemented by an ASIC or the FPGA circuitry 700 of FIG. 7 structured to perform operations corresponding to the machine readable instructions. Additionally or alternatively, the request accessor circuitry 130 may be instantiated by any other combination of hardware, software, and/or firmware. For example, the request accessor circuitry 130 may be implemented by at least one or more hardware circuits (e.g., processor circuitry, discrete and/or integrated analog and/or digital circuitry, an FPGA, an Application Specific Integrated Circuit (ASIC), a comparator, an operational-amplifier (op-amp), a logic circuit, etc.) structured to execute some or all of the machine readable instructions and/or to perform some or all of the operations corresponding to the machine readable instructions without executing software or firmware, but other structures are likewise appropriate.

The example hardware data orchestration circuitry 135 of the illustrated example of FIG. 1 determines whether any prior knowledge is present in the knowledge datastore 145 for the selected hardware (e.g., the selected hardware identified in a request accessed by the request accessor circuitry 130). If no prior knowledge is known for the selected hardware, the example hardware data orchestration circuitry 135 adds an identification of the selected hardware to the knowledge datastore 145. The identification of the hardware enables subsequent performance metrics associated with the selected hardware to be stored in the knowledge datastore 145 in an organized fashion. In some examples, the identification of the selected hardware may be omitted prior to model creation and may, instead, be performed when performance metrics are provided to the knowledge datastore 145 by the execution performance statistic collection circuitry 185.

The example task data orchestration circuitry 140 of the illustrated example of FIG. 1 determines whether any task information is available for the selected task. If no prior knowledge is available for the selected task, the example task data orchestration circuitry 140 adds an identification of the selected task to the knowledge datastore 145. The identification of the selected task enables subsequent performance metrics associated with the selected task to be stored in the knowledge datastore 145 in an organized fashion. In some examples, the identification of the selected task may be omitted prior to model creation and may, instead, be performed when performance metrics are provided to the knowledge datastore 145 by the execution performance statistic collection circuitry 185.
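
The check-then-register behavior described for the selected hardware and the selected task can be sketched with an in-memory stand-in for the knowledge datastore. The `KnowledgeDatastore` class and its methods are hypothetical; they illustrate both the up-front registration and the deferred registration that occurs when performance metrics arrive.

```python
class KnowledgeDatastore:
    """In-memory stand-in for a knowledge datastore (illustrative only)."""

    def __init__(self):
        self.hardware = {}  # hardware identifier -> list of metric records
        self.tasks = {}     # task identifier -> list of metric records

    def ensure_hardware(self, hardware_id):
        """Register the hardware if unknown; return True if it was known."""
        known = hardware_id in self.hardware
        if not known:
            self.hardware[hardware_id] = []
        return known

    def ensure_task(self, task_id):
        """Register the task if unknown; return True if it was known."""
        known = task_id in self.tasks
        if not known:
            self.tasks[task_id] = []
        return known

    def add_metrics(self, hardware_id, task_id, metrics):
        """Store metrics, registering the identifiers lazily if needed."""
        self.ensure_hardware(hardware_id)
        self.ensure_task(task_id)
        record = dict(metrics, hardware=hardware_id, task=task_id)
        self.hardware[hardware_id].append(record)
        self.tasks[task_id].append(record)
        return record


store = KnowledgeDatastore()
first_seen = store.ensure_hardware("hw_a")   # False: no prior knowledge
second_seen = store.ensure_hardware("hw_a")  # True: now registered
store.add_metrics("hw_b", "segmentation", {"latency_ms": 12.5})
```

Because `add_metrics` registers unknown identifiers itself, the explicit registration step before model creation becomes optional in this sketch, matching the alternative described above.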

In some examples, the apparatus includes means for generating task knowledge. For example, the means for generating task knowledge may be implemented by the example task data orchestration circuitry 140. In some examples, the example task data orchestration circuitry 140 may be instantiated by processor circuitry such as the example processor circuitry 512 of FIG. 5. For instance, the example task data orchestration circuitry 140 may be instantiated by the example general purpose processor circuitry 600 of FIG. 6 executing machine executable instructions such as that implemented by at least blocks 320, 335, 325 of FIG. 3. In some examples, the example task data orchestration circuitry 140 may be instantiated by hardware logic circuitry, which may be implemented by an ASIC or the FPGA circuitry 700 of FIG. 7 structured to perform operations corresponding to the machine readable instructions. Additionally or alternatively, the example task data orchestration circuitry 140 may be instantiated by any other combination of hardware, software, and/or firmware. For example, the example task data orchestration circuitry 140 may be implemented by at least one or more hardware circuits (e.g., processor circuitry, discrete and/or integrated analog and/or digital circuitry, an FPGA, an Application Specific Integrated Circuit (ASIC), a comparator, an operational-amplifier (op-amp), a logic circuit, etc.) structured to execute some or all of the machine readable instructions and/or to perform some or all of the operations corresponding to the machine readable instructions without executing software or firmware, but other structures are likewise appropriate.

The example knowledge datastore 145 of the illustrated example of FIG. 1 is implemented by any memory, storage device and/or storage disc for storing data such as, for example, flash memory, magnetic media, optical media, solid state memory, hard drive(s), thumb drive(s), etc. Furthermore, the data stored in the example knowledge datastore 145 may be in any data format such as, for example, binary data, comma delimited data, tab delimited data, structured query language (SQL) structures, etc. While, in the illustrated example, the knowledge datastore 145 is illustrated as a single device, the example knowledge datastore 145 and/or any other data storage devices described herein may be implemented by any number and/or type(s) of memories. In the illustrated example of FIG. 1, the example knowledge datastore 145 stores hardware and/or task knowledge.

The model builder circuitry 115 of FIG. 1 may be instantiated (e.g., creating an instance of, bring into being for any length of time, materialize, implement, etc.) by processor circuitry such as a central processing unit executing instructions. Additionally or alternatively, the model builder circuitry 115 of FIG. 1 may be instantiated (e.g., creating an instance of, bring into being for any length of time, materialize, implement, etc.) by an ASIC or an FPGA structured to perform operations corresponding to the instructions. As noted above, it should be understood that some or all of the circuitry of FIG. 1 may, thus, be instantiated at the same or different times (and/or by different hardware circuitry). Some or all of the circuitry may be instantiated, for example, in one or more threads executing concurrently on hardware and/or in series on hardware. Moreover, in some examples, some or all of the circuitry of FIG. 1 may be implemented by one or more virtual machines and/or containers executing on the microprocessor.

The example model builder circuitry 115 of the illustrated example of FIG. 1 includes search space management circuitry 160, anchor point inserter circuitry 165, neural architecture search circuitry 170, and model outputter circuitry 175. The model builder circuitry 115 is responsible for extracting the insights in the knowledge datastore 145 and executing neural architecture search to identify an optimal model. First, the example search space management circuitry 160 creates a search space. This search space includes the operations provided by the task knowledge from the knowledge datastore 145, variants of those operations, and additional layers if the user specifies any. The neural architecture search circuitry 170 performs a search that is initiated with the configuration identified by the search space management circuitry 160 for the objective, task, HW, etc. Anchor points are inserted in the chosen NAS algorithm by the anchor point inserter circuitry 165 to capture the decisions made during this process. The task knowledge is incorporated in the training loop of the neural architecture search circuitry 170 to inform decisions and guide the search. During training, historical decisions, confidence levels, and the knowledge datastore-based recommendations obtained from the task knowledge are used to guide the neural architecture search.
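
The three ingredients of the search space (operations from the task knowledge, variants of those operations, and optional user-specified layers) can be combined as sketched below. The `create_search_space` helper, the operation names, and the variant table are assumptions introduced for illustration and do not represent a specific NAS library.

```python
def create_search_space(task_ops, variants, user_layers=()):
    """Combine known operations, their variants, and user layers.

    Order is preserved and duplicates are dropped, so operations that the
    task knowledge ranks first stay at the front of the search space.
    """
    space = []
    for op in task_ops:
        for candidate in [op, *variants.get(op, [])]:
            if candidate not in space:
                space.append(candidate)
    for layer in user_layers:
        if layer not in space:
            space.append(layer)
    return space


# Hypothetical inputs (names are illustrative).
ops_from_knowledge = ["conv3x3", "max_pool"]
op_variants = {"conv3x3": ["sep_conv3x3", "dilated_conv3x3"]}
space = create_search_space(ops_from_knowledge, op_variants,
                            user_layers=["attention", "conv3x3"])
```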

In some examples, the apparatus includes means for creating a search space. For example, the means for creating may be implemented by the example search space management circuitry 160. In some examples, the example search space management circuitry 160 may be instantiated by processor circuitry such as the example processor circuitry 512 of FIG. 5. For instance, the example search space management circuitry 160 may be instantiated by the example general purpose processor circuitry 600 of FIG. 6 executing machine executable instructions such as that implemented by at least blocks 327, 340 of FIG. 3. In some examples, the example search space management circuitry 160 may be instantiated by hardware logic circuitry, which may be implemented by an ASIC or the FPGA circuitry 700 of FIG. 7 structured to perform operations corresponding to the machine readable instructions. Additionally or alternatively, the example search space management circuitry 160 may be instantiated by any other combination of hardware, software, and/or firmware. For example, the example search space management circuitry 160 may be implemented by at least one or more hardware circuits (e.g., processor circuitry, discrete and/or integrated analog and/or digital circuitry, an FPGA, an Application Specific Integrated Circuit (ASIC), a comparator, an operational-amplifier (op-amp), a logic circuit, etc.) structured to execute some or all of the machine readable instructions and/or to perform some or all of the operations corresponding to the machine readable instructions without executing software or firmware, but other structures are likewise appropriate.

In some examples, the apparatus includes means for generating a machine learning model. For example, the means for generating may be implemented by the example neural architecture search circuitry 170. In some examples, the example neural architecture search circuitry 170 may be instantiated by processor circuitry such as the example processor circuitry 512 of FIG. 5. For instance, the example neural architecture search circuitry 170 may be instantiated by the example general purpose processor circuitry 600 of FIG. 6 executing machine executable instructions such as that implemented by at least blocks 330, 350 of FIG. 3. In some examples, the example neural architecture search circuitry 170 may be instantiated by hardware logic circuitry, which may be implemented by an ASIC or the FPGA circuitry 700 of FIG. 7 structured to perform operations corresponding to the machine readable instructions. Additionally or alternatively, the example neural architecture search circuitry 170 may be instantiated by any other combination of hardware, software, and/or firmware. For example, the example neural architecture search circuitry 170 may be implemented by at least one or more hardware circuits (e.g., processor circuitry, discrete and/or integrated analog and/or digital circuitry, an FPGA, an Application Specific Integrated Circuit (ASIC), a comparator, an operational-amplifier (op-amp), a logic circuit, etc.) structured to execute some or all of the machine readable instructions and/or to perform some or all of the operations corresponding to the machine readable instructions without executing software or firmware, but other structures are likewise appropriate.

In some examples, the apparatus includes means for inserting. For example, the means for inserting may be implemented by the example anchor point inserter circuitry 165. In some examples, the example anchor point inserter circuitry 165 may be instantiated by processor circuitry such as the example processor circuitry 512 of FIG. 5. For instance, the example anchor point inserter circuitry 165 may be instantiated by the example general purpose processor circuitry 600 of FIG. 6 executing machine executable instructions such as that implemented by at least block 360 of FIG. 3. In some examples, the example anchor point inserter circuitry 165 may be instantiated by hardware logic circuitry, which may be implemented by an ASIC or the FPGA circuitry 700 of FIG. 7 structured to perform operations corresponding to the machine readable instructions. Additionally or alternatively, the example anchor point inserter circuitry 165 may be instantiated by any other combination of hardware, software, and/or firmware. For example, the example anchor point inserter circuitry 165 may be implemented by at least one or more hardware circuits (e.g., processor circuitry, discrete and/or integrated analog and/or digital circuitry, an FPGA, an Application Specific Integrated Circuit (ASIC), a comparator, an operational-amplifier (op-amp), a logic circuit, etc.) structured to execute some or all of the machine readable instructions and/or to perform some or all of the operations corresponding to the machine readable instructions without executing software or firmware, but other structures are likewise appropriate.

After generation of the model, the example model outputter circuitry 175 provides a model for execution. In some examples, the decisions and/or rationales selected during the neural architecture search are made available in association with the generated model.

The target hardware 120 of FIG. 1 may be instantiated (e.g., creating an instance of, bring into being for any length of time, materialize, implement, etc.) by processor circuitry such as a central processing unit executing instructions. Additionally or alternatively, the target hardware 120 of FIG. 1 may be instantiated (e.g., creating an instance of, bring into being for any length of time, materialize, implement, etc.) by an ASIC or an FPGA structured to perform operations corresponding to the instructions. As noted above, it should be understood that some or all of the circuitry of FIG. 1 may, thus, be instantiated at the same or different times (and/or by different hardware circuitry). Some or all of the circuitry may be instantiated, for example, in one or more threads executing concurrently on hardware and/or in series on hardware. Moreover, in some examples, some or all of the circuitry of FIG. 1 may be implemented by one or more virtual machines and/or containers executing on the microprocessor.

The example target hardware 120 of the illustrated example of FIG. 1 includes model execution circuitry 180 and execution performance statistic collection circuitry 185. The example model execution circuitry 180 of the illustrated example of FIG. 1 executes the model provided by the model outputter circuitry 175.

The example execution performance statistic collection circuitry 185 of the illustrated example of FIG. 1, during execution of the model by the model execution circuitry 180, collects model execution statistics using the inserted anchor points. The collected execution statistics are provided to the knowledge datastore 145. In examples disclosed herein, the collected execution statistics include information about the anchor points. Including information about the anchor points enables statistics specific to particular features to be utilized when generating task knowledge.
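
One way to picture anchor-point-based statistics collection is a named timing region wrapped around a portion of model execution. The sketch below is an assumption about how such anchors might look in software; the `AnchorRecorder` class, the anchor names, and the recorded fields are hypothetical and are not drawn from the disclosure.

```python
import time
from contextlib import contextmanager


class AnchorRecorder:
    """Collects per-anchor execution statistics (illustrative sketch)."""

    def __init__(self):
        self.records = []

    @contextmanager
    def anchor(self, name, **metadata):
        """Time the enclosed region and record it under the anchor name."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.records.append(
                {"anchor": name, "elapsed_s": elapsed, **metadata})


recorder = AnchorRecorder()

# Stand-ins for two portions of a model's execution.
with recorder.anchor("block0", operation="conv3x3"):
    sum(i * i for i in range(1000))
with recorder.anchor("block1", operation="max_pool"):
    max(range(1000))
```

Each record carries the anchor name and its metadata, so statistics can later be grouped by the feature (here, the operation) that the anchor marks.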

FIG. 2 is a block diagram of an example process flow utilizing the example system of FIG. 1. The example process begins when a user submits a request for generation of a model to perform a selected task. (Block 210). The requested model is generated using neural architecture search and prior knowledge of models associated with the selected task. (Block 220). The generated models are provided to the target hardware for execution and collection of performance statistics. (Block 230). Execution features are extracted from the models. (Block 240). The extracted features are ranked based on collected performance metrics. (Block 250). The extracted features and their associated performance metrics are added to the knowledge datastore 145. (Block 260). This added knowledge may then subsequently be used for future generation of models. (Block 220).

While an example manner of implementing the example knowledge builder circuitry 105 and/or the example model builder circuitry 115 is illustrated in FIG. 1, one or more of the elements, processes, and/or devices illustrated in FIG. 1 may be combined, divided, re-arranged, omitted, eliminated, and/or implemented in any other way. Further, the example request accessor circuitry 130, the example hardware data orchestration circuitry 135, the example task data orchestration circuitry 140, and/or, more generally, the example knowledge builder circuitry 105 of FIG. 1, and/or the example search space management circuitry 160, the example anchor point inserter circuitry 165, the example neural architecture search circuitry 170, the example model outputter circuitry 175, and/or, more generally, the example model builder circuitry 115 of FIG. 1, may be implemented by hardware alone or by hardware in combination with software and/or firmware. Thus, for example, any of the example request accessor circuitry 130, the example hardware data orchestration circuitry 135, the example task data orchestration circuitry 140, and/or, more generally, the example knowledge builder circuitry 105 of FIG. 1, and/or the example search space management circuitry 160, the example anchor point inserter circuitry 165, the example neural architecture search circuitry 170, the example model outputter circuitry 175, and/or, more generally, the example model builder circuitry 115 of FIG. 1, could be implemented by processor circuitry, analog circuit(s), digital circuit(s), logic circuit(s), programmable processor(s), programmable microcontroller(s), graphics processing unit(s) (GPU(s)), digital signal processor(s) (DSP(s)), application specific integrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)), and/or field programmable logic device(s) (FPLD(s)) such as Field Programmable Gate Arrays (FPGAs). Further still, the example request accessor circuitry 130, the example hardware data orchestration circuitry 135, the example task data orchestration circuitry 140, and/or, more generally, the example knowledge builder circuitry 105 of FIG. 1, and/or the example search space management circuitry 160, the example anchor point inserter circuitry 165, the example neural architecture search circuitry 170, the example model outputter circuitry 175, and/or, more generally, the example model builder circuitry 115 of FIG. 1 may include one or more elements, processes, and/or devices in addition to, or instead of, those illustrated in FIG. 1, and/or may include more than one of any or all of the illustrated elements, processes and devices.

A flowchart representative of example hardware logic circuitry, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing the example knowledge builder circuitry 105 and/or the example model builder circuitry 115 of FIG. 1 is shown in FIG. 3. The machine readable instructions may be one or more executable programs or portion(s) of an executable program for execution by processor circuitry, such as the processor circuitry 512 shown in the example processor platform 500 discussed below in connection with FIG. 5 and/or the example processor circuitry discussed below in connection with FIGS. 6 and/or 7.

A flowchart representative of example hardware logic circuitry, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing the target hardware 120 of FIG. 1 is shown in FIG. 4. The machine readable instructions may be one or more executable programs or portion(s) of an executable program for execution by processor circuitry, such as the processor circuitry 512 shown in the example processor platform 500 discussed below in connection with FIG. 5 and/or the example processor circuitry discussed below in connection with FIGS. 6 and/or 7.

The programs of FIGS. 3 and/or 4 may be embodied in software stored on one or more non-transitory computer readable storage media such as a compact disk (CD), a floppy disk, a hard disk drive (HDD), a solid-state drive (SSD), a digital versatile disk (DVD), a Blu-ray disk, a volatile memory (e.g., Random Access Memory (RAM) of any type, etc.), or a non-volatile memory (e.g., electrically erasable programmable read-only memory (EEPROM), FLASH memory, an HDD, an SSD, etc.) associated with processor circuitry located in one or more hardware devices, but the entire program and/or parts thereof could alternatively be executed by one or more hardware devices other than the processor circuitry and/or embodied in firmware or dedicated hardware. The machine readable instructions may be distributed across multiple hardware devices and/or executed by two or more hardware devices (e.g., a server and a client hardware device). For example, the client hardware device may be implemented by an endpoint client hardware device (e.g., a hardware device associated with a user) or an intermediate client hardware device (e.g., a radio access network (RAN) gateway that may facilitate communication between a server and an endpoint client hardware device). Similarly, the non-transitory computer readable storage media may include one or more mediums located in one or more hardware devices. Further, although the example program is described with reference to the flowchart illustrated in FIG. 3, many other methods of implementing the example knowledge builder circuitry 105 and/or the example model builder circuitry 115 may alternatively be used. For example, the order of execution of the blocks may be changed, and/or some of the blocks described may be changed, eliminated, or combined. Additionally or alternatively, any or all of the blocks may be implemented by one or more hardware circuits (e.g., processor circuitry, discrete and/or integrated analog and/or digital circuitry, an FPGA, an ASIC, a comparator, an operational-amplifier (op-amp), a logic circuit, etc.) structured to perform the corresponding operation without executing software or firmware. The processor circuitry may be distributed in different network locations and/or local to one or more hardware devices (e.g., a single-core processor (e.g., a single core central processor unit (CPU)), a multi-core processor (e.g., a multi-core CPU), etc.) in a single machine, multiple processors distributed across multiple servers of a server rack, multiple processors distributed across one or more server racks, a CPU and/or an FPGA located in the same package (e.g., the same integrated circuit (IC) package or in two or more separate housings, etc.).

The machine readable instructions described herein may be stored in one or more of a compressed format, an encrypted format, a fragmented format, a compiled format, an executable format, a packaged format, etc. Machine readable instructions as described herein may be stored as data or a data structure (e.g., as portions of instructions, code, representations of code, etc.) that may be utilized to create, manufacture, and/or produce machine executable instructions. For example, the machine readable instructions may be fragmented and stored on one or more storage devices and/or computing devices (e.g., servers) located at the same or different locations of a network or collection of networks (e.g., in the cloud, in edge devices, etc.). The machine readable instructions may require one or more of installation, modification, adaptation, updating, combining, supplementing, configuring, decryption, decompression, unpacking, distribution, reassignment, compilation, etc., in order to make them directly readable, interpretable, and/or executable by a computing device and/or other machine. For example, the machine readable instructions may be stored in multiple parts, which are individually compressed, encrypted, and/or stored on separate computing devices, wherein the parts when decrypted, decompressed, and/or combined form a set of machine executable instructions that implement one or more operations that may together form a program such as that described herein.

In another example, the machine readable instructions may be stored in a state in which they may be read by processor circuitry, but require addition of a library (e.g., a dynamic link library (DLL)), a software development kit (SDK), an application programming interface (API), etc., in order to execute the machine readable instructions on a particular computing device or other device. In another example, the machine readable instructions may need to be configured (e.g., settings stored, data input, network addresses recorded, etc.) before the machine readable instructions and/or the corresponding program(s) can be executed in whole or in part. Thus, machine readable media, as used herein, may include machine readable instructions and/or program(s) regardless of the particular format or state of the machine readable instructions and/or program(s) when stored or otherwise at rest or in transit.

The machine readable instructions described herein can be represented by any past, present, or future instruction language, scripting language, programming language, etc. For example, the machine readable instructions may be represented using any of the following languages: C, C++, Java, C#, Perl, Python, JavaScript, HyperText Markup Language (HTML), Structured Query Language (SQL), Swift, etc.

As mentioned above, the example operations of FIGS. 3 and/or 4 may be implemented using executable instructions (e.g., computer and/or machine readable instructions) stored on one or more non-transitory computer and/or machine readable media such as optical storage devices, magnetic storage devices, an HDD, a flash memory, a read-only memory (ROM), a CD, a DVD, a cache, a RAM of any type, a register, and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information). As used herein, the terms non-transitory computer readable medium and non-transitory computer readable storage medium are expressly defined to include any type of computer readable storage device and/or storage disk and to exclude propagating signals and to exclude transmission media.

“Including” and “comprising” (and all forms and tenses thereof) are used herein to be open ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc., may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the terms “comprising” and “including” are open ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, or (7) A with B and with C. As used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B. Similarly, as used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B. As used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B. Similarly, as used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B.

As used herein, singular references (e.g., “a”, “an”, “first”, “second”, etc.) do not exclude a plurality. The term “a” or “an” object, as used herein, refers to one or more of that object. The terms “a” (or “an”), “one or more”, and “at least one” are used interchangeably herein. Furthermore, although individually listed, a plurality of means, elements or method actions may be implemented by, e.g., the same entity or object. Additionally, although individual features may be included in different examples or claims, these may possibly be combined, and the inclusion in different examples or claims does not imply that a combination of features is not feasible and/or advantageous.

FIG. 3 is a flowchart representative of example machine readable instructions and/or example operations 300 that may be executed and/or instantiated by processor circuitry to implement the example knowledge builder circuitry and the example model builder circuitry of FIG. 1. The machine readable instructions and/or the operations 300 of FIG. 3 begin at block 310, at which the request accessor circuitry 130 receives a request for generation of a model to perform a selected task. (Block 310). In examples disclosed herein, the user input 110 received by the request accessor circuitry 130 includes information such as, for example, an objective of a machine learning model, a task to be performed by the machine learning model, and, in some examples, one or more characteristics of a target hardware on which the machine learning model is to be executed. The request may be formatted as, for example, a request received at a web server or a request formatted in a structured data format (e.g., a JavaScript object notation (JSON) format, an extensible markup language (XML) format, etc.). The example request accessor circuitry 130 accesses hardware data orchestration information via the hardware data orchestration circuitry 135 and task data orchestration information via the task data orchestration circuitry 140. The accessed information (if available) and the request are provided to the search space management circuitry 160 of the model builder circuitry 115.
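
By way of illustration only, a JSON-formatted request of the kind described above might be parsed as in the following minimal Python sketch. The field names (task, objective, target_hardware), the ModelRequest class, and the parse_request function are hypothetical and are not defined by this disclosure; they merely mirror the objective, task, and optional target hardware characteristics described above.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelRequest:
    """Hypothetical in-memory form of the user input 110."""
    task: str
    objective: str
    # Target hardware characteristics are optional, per the description.
    target_hardware: Optional[dict] = None


def parse_request(payload: str) -> ModelRequest:
    """Parse a JSON-formatted request; field names are illustrative only."""
    data = json.loads(payload)
    return ModelRequest(
        task=data["task"],
        objective=data["objective"],
        target_hardware=data.get("target_hardware"),
    )


# Example request with assumed field values.
example_payload = json.dumps({
    "task": "image_classification",
    "objective": "minimize_latency",
    "target_hardware": {"type": "cpu", "cores": 8},
})
request = parse_request(example_payload)
```

In such a sketch, a request that omits the target hardware simply yields a ModelRequest whose target_hardware is None.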

The example hardware data orchestration circuitry 135 determines whether any prior knowledge is present in the knowledge datastore 145 for the selected hardware. (Block 312). If no prior knowledge is known for the selected hardware (e.g., block 312 returns a result of NO), the example hardware data orchestration circuitry 135 adds an identification of the selected hardware to the knowledge datastore 145. (Block 314). The identification of the hardware enables subsequent performance metrics associated with the selected hardware to be stored in the knowledge datastore 145 in an organized fashion. In some examples, the identification of the selected hardware may be omitted prior to model creation and may, instead, be performed when performance metrics are provided to the knowledge datastore by the execution performance statistic collection circuitry 185.

The example task data orchestration circuitry 140 determines whether any task information is available for the selected task. (Block 320). If no prior knowledge is available for the selected task (e.g., block 320 returns a result of NO), the example task data orchestration circuitry 140 adds an identification of the selected task to the knowledge datastore 145. (Block 325). The identification of the selected task enables subsequent performance metrics associated with the selected task to be stored in the knowledge datastore 145 in an organized fashion. In some examples, the identification of the selected task may be omitted prior to model creation and may, instead, be performed when performance metrics are provided to the knowledge datastore by the execution performance statistic collection circuitry 185. The example search space management circuitry 160 creates a search space based on user selection of available building blocks or building blocks from existing state-of-the-art architecture(s) for the task. (Block 327). In this manner, the search space is created, but is not based on specific prior task knowledge (as is described in connection with block 340, below). In some examples, the ability to perform user selection of available building blocks (and/or whether to use state-of-the-art architecture(s) for the task) may be configurable by policy.
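
The prior-knowledge checks and registrations of blocks 312, 314, 320, and 325 may be sketched, under simplifying assumptions, as follows. The KnowledgeDatastore class below is a hypothetical in-memory stand-in for the knowledge datastore 145, and its method names are illustrative; an actual datastore may be organized differently.

```python
class KnowledgeDatastore:
    """Hypothetical in-memory stand-in for the knowledge datastore 145."""

    def __init__(self):
        self.hardware = {}  # hardware identifier -> list of performance records
        self.tasks = {}     # task identifier -> list of model records

    def ensure_hardware(self, hardware_id):
        """Return True if prior knowledge exists for the hardware (block 312).

        If none exists, the identifier is registered (block 314) so that later
        performance metrics can be stored under it.
        """
        has_prior = bool(self.hardware.get(hardware_id))
        self.hardware.setdefault(hardware_id, [])
        return has_prior

    def ensure_task(self, task_id):
        """Return True if prior knowledge exists for the task (block 320).

        If none exists, the identifier is registered (block 325).
        """
        has_prior = bool(self.tasks.get(task_id))
        self.tasks.setdefault(task_id, [])
        return has_prior


store = KnowledgeDatastore()
first_lookup = store.ensure_task("image_classification")   # no knowledge yet
store.tasks["image_classification"].append({"model": "example", "score": 0.9})
second_lookup = store.ensure_task("image_classification")  # knowledge present
```

In this sketch, a NO result at block 320 corresponds to ensure_task returning False, and a YES result corresponds to it returning True.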

The example NAS search circuitry 170 performs neural architecture search to generate a model using the search space. (Block 330). In the illustrated example of FIG. 3, the NAS search circuitry 170 starts from an uninitialized state. That is, no prior knowledge of performance of various tasks and/or hardware on which the tasks are to be executed is used when performing the neural architecture search of block 330.

Returning to block 320, if the task data orchestration circuitry 140 determines that prior knowledge is present for the selected task (e.g., block 320 returns a result of YES), the example task data orchestration circuitry 140 builds task knowledge. (Block 335). To build the task knowledge, model information is retrieved by the task data orchestration circuitry 140 from the knowledge datastore 145 for the specific task and features are extracted from the models. In cases of a new or custom task, similar tasks/models are retrieved based on the user input. These features include, but are not limited to, the framework used to train the model, the hardware specification and/or any information for mapping the model (latencies, etc.) including hardware telemetry, the performance objective, sequence of operations, number of FLOPs, dataset used, number of layers, etc. These features are then ranked by hardware, objective, etc. The respective features extracted and ranked from the model(s) are collectively identified as the task knowledge, which is then used to create the search space. In some examples, such task knowledge is archived in the knowledge datastore 145 to allow for efficient retrieval should a same task be later requested.
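
One possible way to build task knowledge is sketched below, assuming that each stored model record is a dictionary and that a single numeric score (higher being better) summarizes its collected performance. The record keys, the score field, and the build_task_knowledge function are assumptions made for illustration and are not prescribed by this disclosure.

```python
from collections import defaultdict


def build_task_knowledge(model_records):
    """Extract features from model records, then group and rank them.

    Features are grouped by (hardware, objective) and each group is sorted
    by an assumed "score" field, best first.
    """
    grouped = defaultdict(list)
    for record in model_records:
        features = {
            "framework": record["framework"],
            "num_layers": record["num_layers"],
            "flops": record["flops"],
            "score": record["score"],
        }
        grouped[(record["hardware"], record["objective"])].append(features)
    return {
        key: sorted(entries, key=lambda f: f["score"], reverse=True)
        for key, entries in grouped.items()
    }


# Illustrative records; all values are made up for the example.
records = [
    {"hardware": "cpu", "objective": "latency", "framework": "fw_a",
     "num_layers": 18, "flops": 1.8e9, "score": 0.71},
    {"hardware": "cpu", "objective": "latency", "framework": "fw_a",
     "num_layers": 12, "flops": 0.9e9, "score": 0.86},
    {"hardware": "gpu", "objective": "accuracy", "framework": "fw_b",
     "num_layers": 50, "flops": 4.1e9, "score": 0.93},
]
task_knowledge = build_task_knowledge(records)
```

The resulting dictionary could then be archived per task, so that a later request for the same task avoids repeating the extraction.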

The example search space management circuitry 160 creates a search space from the prior task knowledge. (Block 340). The search space may be created by, for example, ranking and selecting a prior architecture that had an acceptable level of performance on the target hardware (and/or hardware similar to the target hardware). In some examples, performance statistics stored in the knowledge datastore 145 associated with different architectures and tasks are compared to select an architecture meeting a threshold performance statistic. In some examples, the performance statistic upon which the selection is based may be dependent upon the user input 110 which may indicate, for example, whether power consumption statistics are to be prioritized over processing speed statistics.
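
The threshold-based selection described above might be sketched as follows. The sketch assumes lower-is-better metrics (e.g., latency in milliseconds, power in watts); the metric names, candidate records, and the select_seed_architecture function are hypothetical.

```python
def select_seed_architecture(candidates, priority, threshold):
    """Return the best prior architecture meeting the threshold, or None.

    `priority` names the metric to prioritize (e.g., power over speed) and
    is treated as lower-is-better in this sketch.
    """
    eligible = [c for c in candidates if c["metrics"][priority] <= threshold]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c["metrics"][priority])


# Illustrative prior architectures; all values are made up for the example.
candidates = [
    {"name": "arch_a", "metrics": {"latency_ms": 25.0, "power_w": 3.0}},
    {"name": "arch_b", "metrics": {"latency_ms": 12.0, "power_w": 6.5}},
    {"name": "arch_c", "metrics": {"latency_ms": 18.0, "power_w": 4.0}},
]
by_speed = select_seed_architecture(candidates, "latency_ms", threshold=20.0)
by_power = select_seed_architecture(candidates, "power_w", threshold=5.0)
```

Changing the prioritized metric changes which prior architecture seeds the search space, which reflects the dependence on the user input 110 noted above.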

In some examples, the selection of the prioritization (e.g., prioritization of functionality, performance, power optimization, etc.) may be guided by a policy. For example, a policy may be provided by a policy-providing entity to control behavior of the training operations and/or search space management. In some examples, the policy controls other details about the creation and/or training of the model including, for example, different levels of neural network sparsity (e.g., 50%, 90%, etc.), different levels of precision (e.g., thirty-two-bit floating point values, sixteen-bit floating point values, eight-bit integer values, etc.).
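
A policy of this kind could be represented, purely as an illustrative assumption, by a small validated record such as the following. The GenerationPolicy class, its field names, and the precision labels are hypothetical and do not reflect any particular policy format.

```python
from dataclasses import dataclass

# Labels for the precisions named above (32-bit float, 16-bit float, 8-bit int).
ALLOWED_PRECISIONS = {"fp32", "fp16", "int8"}


@dataclass(frozen=True)
class GenerationPolicy:
    """Hypothetical policy record; field names are illustrative assumptions."""
    priority: str = "performance"  # e.g., "functionality", "performance", "power"
    sparsity: float = 0.5          # fraction of weights pruned, e.g., 0.5 or 0.9
    precision: str = "fp32"

    def __post_init__(self):
        # Reject values outside the ranges this sketch supports.
        if not 0.0 <= self.sparsity < 1.0:
            raise ValueError("sparsity must be in [0, 1)")
        if self.precision not in ALLOWED_PRECISIONS:
            raise ValueError(f"unsupported precision: {self.precision}")


policy = GenerationPolicy(priority="power", sparsity=0.9, precision="int8")
```

Making the record immutable (frozen) is a design choice in this sketch so that a provisioned policy cannot be altered by the requesting user.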

In some examples, the policy-providing entity may be a user of the system of FIG. 1. However, the policy-providing entity may be any other entity that guides functionality of the system of FIG. 1 including, for example, a system administrator, a manufacturer, a device provider, etc. In some examples, the policy-providing entity may be separate from the user. In this manner, the user is able to input requests for training and/or creation of a machine learning model, while the parameters under which the machine learning model is trained and/or created are based on the policy created by the policy-providing entity.

In some examples, the policy is provisioned to the system of FIG. 1 by the policy-providing entity via a platform Trusted Execution Environment (TEE). However, the policy may be provided to the system of FIG. 1 in any other manner.

The example NAS search circuitry 170 generates a model using neural architecture search, based on the search space created by the search space management circuitry 160. (Block 350). In this manner, the neural architecture search performed by the NAS search circuitry 170 at block 350 starts from an initialized state based on the prior task knowledge (e.g., starting from an architecture which previously met a performance threshold).
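
The difference between an uninitialized search (block 330) and an initialized search (block 350) can be illustrated with the following toy sketch. It uses a simple mutation-based hill climb over a list of layer widths, which is a stand-in chosen for brevity and not the neural architecture search algorithm of this disclosure; the function name, parameters, and scoring function are assumptions.

```python
import random


def neural_architecture_search(evaluate, choices, depth, seed_arch=None,
                               steps=50, rng=None):
    """Toy mutation-based search over per-layer choices.

    If seed_arch is given, the search starts from it (initialized state);
    otherwise it starts from a random architecture (uninitialized state).
    """
    rng = rng or random.Random(0)
    if seed_arch is not None:
        current = list(seed_arch)
    else:
        current = [rng.choice(choices) for _ in range(depth)]
    best_score = evaluate(current)
    for _ in range(steps):
        # Mutate one position and keep the candidate only if it improves.
        candidate = list(current)
        candidate[rng.randrange(len(candidate))] = rng.choice(choices)
        score = evaluate(candidate)
        if score > best_score:
            current, best_score = candidate, score
    return current, best_score


def toy_score(arch):
    """Made-up objective: prefer layer widths close to 64."""
    return -sum(abs(width - 64) for width in arch)


widths = [16, 32, 64, 128]
seed = [64, 64, 32]  # e.g., a prior architecture that met a threshold
warm_arch, warm_score = neural_architecture_search(
    toy_score, widths, depth=3, seed_arch=seed, rng=random.Random(1))
cold_arch, cold_score = neural_architecture_search(
    toy_score, widths, depth=3, rng=random.Random(1))
```

Because only improving candidates are accepted, the initialized search in this sketch never ends with a score below that of its starting architecture.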

The example anchor point inserter circuitry 165 then inserts anchor points into the generated model. (Block 360). Anchor points provide locations at which performance statistics are to be measured by the execution performance statistic collection circuitry 185. Moreover, the anchor points provide locations by which additional information about the model and/or the objectives/tasks of the model may be captured. In examples disclosed herein, anchor points are inserted between respective layers of the generated model. In some examples, anchor points are added to the model prior to the first layer and after the last layer of the model. In some other examples, anchor points are added adjacent (e.g., before and after) particular types of layers (e.g., a convolution layer).
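
Anchor point insertion may be sketched as follows, assuming for illustration that a model is represented as an ordered list of layer dictionaries and that an anchor point is a marker entry in that list. The sketch combines two of the placements described above (before the first layer and after the last layer, and adjacent to selected layer types); the representation and the insert_anchor_points function are hypothetical.

```python
def insert_anchor_points(layers, around_types=("conv",)):
    """Return a new layer list with anchor markers inserted.

    Anchors are placed before the first layer, after the last layer, and
    before and after every layer whose type is in around_types. Adjacent
    duplicate anchors are avoided.
    """
    anchored = []
    next_id = 0

    def add_anchor():
        nonlocal next_id
        anchored.append({"type": "anchor", "id": next_id})
        next_id += 1

    add_anchor()  # before the first layer
    for layer in layers:
        wrap = layer["type"] in around_types
        if wrap and anchored[-1]["type"] != "anchor":
            add_anchor()
        anchored.append(layer)
        if wrap:
            add_anchor()
    if anchored[-1]["type"] != "anchor":
        add_anchor()  # after the last layer
    return anchored


model_layers = [{"type": "conv"}, {"type": "relu"}, {"type": "dense"}]
anchored_model = insert_anchor_points(model_layers)
anchored_types = [entry["type"] for entry in anchored_model]
```

The original layer list is left unmodified, so the same generated model can be instrumented with different anchor placements.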

The example model outputter circuitry 175 provides the generated model to the target hardware 120 for execution by the model execution circuitry 180. (Block 370). In examples disclosed herein, the model may first be stored at a storage location (e.g., a server) before being provided to the model execution circuitry 180. In some examples, the model execution circuitry 180 may retrieve the model from the storage location or directly from the model outputter circuitry 175. The process of the illustrated example of FIG. 3 then terminates, but may be re-executed upon, for example, receipt of subsequent user input 110.

FIG. 4 is a flowchart representative of example machine readable instructions and/or example operations 400 that may be executed and/or instantiated by processor circuitry to implement the example target hardware 120 of FIG. 1. The machine readable instructions and/or the operations 400 of FIG. 4 begin at block 410, at which the model execution circuitry 180 begins execution of a model received from the model outputter circuitry 175. (Block 410). During execution of the model, the example execution performance statistic collection circuitry 185 collects model execution statistics using the inserted anchor points. (Block 420). The collected execution statistics are provided to the knowledge datastore 145. (Block 430). In examples disclosed herein, the collected execution statistics include information about the anchor points. Including information about the anchor points enables statistics specific to particular features to be utilized when generating task knowledge.
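
Collection of execution statistics at anchor points (block 420) may be sketched as follows. The sketch assumes, for illustration only, that the model is an ordered list in which each layer entry carries a callable under the key "fn" and each anchor entry carries an "id"; elapsed wall-clock time between consecutive anchors is the only statistic recorded here, although other statistics could be collected.

```python
import time


def execute_with_anchors(anchored_layers, x):
    """Run the layers in order and record a statistic at each anchor.

    At each anchor, the time elapsed since the previous anchor (or since the
    start of execution) is recorded together with the anchor identifier.
    """
    stats = []
    last = time.perf_counter()
    for entry in anchored_layers:
        if entry.get("type") == "anchor":
            now = time.perf_counter()
            stats.append({"anchor_id": entry["id"], "elapsed_s": now - last})
            last = now
        else:
            x = entry["fn"](x)
    return x, stats


# Illustrative two-layer "model" with three anchors.
toy_model = [
    {"type": "anchor", "id": 0},
    {"type": "scale", "fn": lambda v: v * 2},
    {"type": "anchor", "id": 1},
    {"type": "shift", "fn": lambda v: v + 1},
    {"type": "anchor", "id": 2},
]
output, execution_stats = execute_with_anchors(toy_model, 3)
```

Because each record carries its anchor identifier, the statistics provided to the knowledge datastore 145 can be attributed to the particular layers between anchors.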

FIG. 5 is a block diagram of an example processor platform 500 structured to execute and/or instantiate the machine readable instructions and/or the operations of FIGS. 3 and/or 4 to implement the system 100 of FIG. 1. The processor platform 500 can be, for example, a server, a personal computer, a workstation, a self-learning machine (e.g., a neural network), a mobile device (e.g., a cell phone, a smart phone, a tablet such as an iPad™), a personal digital assistant (PDA), an Internet appliance, a DVD player, a CD player, a digital video recorder, a Blu-ray player, a gaming console, a personal video recorder, a set top box, a headset (e.g., an augmented reality (AR) headset, a virtual reality (VR) headset, etc.) or other wearable device, or any other type of computing device.

The processor platform 500 of the illustrated example includes processor circuitry 512. The processor circuitry 512 of the illustrated example is hardware. For example, the processor circuitry 512 can be implemented by one or more integrated circuits, logic circuits, FPGAs, microprocessors, CPUs, GPUs, DSPs, and/or microcontrollers from any desired family or manufacturer. The processor circuitry 512 may be implemented by one or more semiconductor based (e.g., silicon based) devices. In this example, the processor circuitry 512 implements the knowledge builder circuitry 105 and the model builder circuitry 115. In some examples, the knowledge builder circuitry 105 and the model builder circuitry 115 may be implemented on separate processor platforms.

The processor circuitry 512 of the illustrated example includes a local memory 513 (e.g., a cache, registers, etc.). The processor circuitry 512 of the illustrated example is in communication with a main memory including a volatile memory 514 and a non-volatile memory 516 by a bus 518. The volatile memory 514 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®), and/or any other type of RAM device. The non-volatile memory 516 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 514, 516 of the illustrated example is controlled by a memory controller 517.

The processor platform 500 of the illustrated example also includes interface circuitry 520. The interface circuitry 520 may be implemented by hardware in accordance with any type of interface standard, such as an Ethernet interface, a universal serial bus (USB) interface, a Bluetooth® interface, a near field communication (NFC) interface, a Peripheral Component Interconnect (PCI) interface, and/or a Peripheral Component Interconnect Express (PCIe) interface.

In the illustrated example, one or more input devices 522 are connected to the interface circuitry 520. The input device(s) 522 permit(s) a user to enter data and/or commands into the processor circuitry 512. The input device(s) 522 can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, an isopoint device, and/or a voice recognition system.

One or more output devices 524 are also connected to the interface circuitry 520 of the illustrated example. The output device(s) 524 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube (CRT) display, an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer, and/or a speaker. The interface circuitry 520 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip, and/or graphics processor circuitry such as a GPU.

The interface circuitry 520 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) by a network 526. The communication can be by, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, an optical connection, etc.

The processor platform 500 of the illustrated example also includes one or more mass storage devices 528 to store software and/or data. Examples of such mass storage devices 528 include magnetic storage devices, optical storage devices, floppy disk drives, HDDs, CDs, Blu-ray disk drives, redundant array of independent disks (RAID) systems, solid state storage devices such as flash memory devices and/or SSDs, and DVD drives.

The machine executable instructions 532, which may be implemented by the machine readable instructions of FIGS. 3 and/or 4, may be stored in the mass storage device 528, in the volatile memory 514, in the non-volatile memory 516, and/or on a removable non-transitory computer readable storage medium such as a CD or DVD.

FIG. 6 is a block diagram of an example implementation of the processor circuitry 512 of FIG. 5. In this example, the processor circuitry 512 of FIG. 5 is implemented by a general purpose microprocessor 600. The general purpose microprocessor circuitry 600 executes some or all of the machine readable instructions of the flowchart of FIG. 3 to effectively instantiate the knowledge builder circuitry 105 and/or the example model builder circuitry 115 of FIG. 1 as logic circuits to perform the operations corresponding to those machine readable instructions. In some such examples, the circuitry of FIG. 1 is instantiated by the hardware circuits of the microprocessor 600 in combination with the instructions. For example, the microprocessor 600 may implement multi-core hardware circuitry such as a CPU, a DSP, a GPU, an XPU, etc. Although it may include any number of example cores 602 (e.g., 1 core), the microprocessor 600 of this example is a multi-core semiconductor device including N cores. The cores 602 of the microprocessor 600 may operate independently or may cooperate to execute machine readable instructions. For example, machine code corresponding to a firmware program, an embedded software program, or a software program may be executed by one of the cores 602 or may be executed by multiple ones of the cores 602 at the same or different times. In some examples, the machine code corresponding to the firmware program, the embedded software program, or the software program is split into threads and executed in parallel by two or more of the cores 602. The software program may correspond to a portion or all of the machine readable instructions and/or operations represented by the flowchart of FIG. 3.

The cores 602 may communicate by a first example bus 604. In some examples, the first bus 604 may implement a communication bus to effectuate communication associated with one(s) of the cores 602. For example, the first bus 604 may implement at least one of an Inter-Integrated Circuit (I2C) bus, a Serial Peripheral Interface (SPI) bus, a PCI bus, or a PCIe bus. Additionally or alternatively, the first bus 604 may implement any other type of computing or electrical bus. The cores 602 may obtain data, instructions, and/or signals from one or more external devices by example interface circuitry 606. The cores 602 may output data, instructions, and/or signals to the one or more external devices by the interface circuitry 606. Although the cores 602 of this example include example local memory 620 (e.g., Level 1 (L1) cache that may be split into an L1 data cache and an L1 instruction cache), the microprocessor 600 also includes example shared memory 610 that may be shared by the cores (e.g., Level 2 (L2) cache) for high-speed access to data and/or instructions. Data and/or instructions may be transferred (e.g., shared) by writing to and/or reading from the shared memory 610. The local memory 620 of each of the cores 602 and the shared memory 610 may be part of a hierarchy of storage devices including multiple levels of cache memory and the main memory (e.g., the main memory 514, 516 of FIG. 5). Typically, higher levels of memory in the hierarchy exhibit lower access time and have smaller storage capacity than lower levels of memory. Changes in the various levels of the cache hierarchy are managed (e.g., coordinated) by a cache coherency policy.

Each core 602 may be referred to as a CPU, DSP, GPU, etc., or any other type of hardware circuitry. Each core 602 includes control unit circuitry 614, arithmetic and logic (AL) circuitry (sometimes referred to as an ALU) 616, a plurality of registers 618, the L1 cache 620, and a second example bus 622. Other structures may be present. For example, each core 602 may include vector unit circuitry, single instruction multiple data (SIMD) unit circuitry, load/store unit (LSU) circuitry, branch/jump unit circuitry, floating-point unit (FPU) circuitry, etc. The control unit circuitry 614 includes semiconductor-based circuits structured to control (e.g., coordinate) data movement within the corresponding core 602. The AL circuitry 616 includes semiconductor-based circuits structured to perform one or more mathematic and/or logic operations on the data within the corresponding core 602. The AL circuitry 616 of some examples performs integer based operations. In other examples, the AL circuitry 616 also performs floating point operations. In yet other examples, the AL circuitry 616 may include first AL circuitry that performs integer based operations and second AL circuitry that performs floating point operations. In some examples, the AL circuitry 616 may be referred to as an Arithmetic Logic Unit (ALU). The registers 618 are semiconductor-based structures to store data and/or instructions such as results of one or more of the operations performed by the AL circuitry 616 of the corresponding core 602. For example, the registers 618 may include vector register(s), SIMD register(s), general purpose register(s), flag register(s), segment register(s), machine specific register(s), instruction pointer register(s), control register(s), debug register(s), memory management register(s), machine check register(s), etc. The registers 618 may be arranged in a bank as shown in FIG. 6. Alternatively, the registers 618 may be organized in any other arrangement, format, or structure including distributed throughout the core 602 to shorten access time. The second bus 622 may implement at least one of an I2C bus, a SPI bus, a PCI bus, or a PCIe bus.

Each core 602 and/or, more generally, the microprocessor 600 may include additional and/or alternate structures to those shown and described above. For example, one or more clock circuits, one or more power supplies, one or more power gates, one or more cache home agents (CHAs), one or more converged/common mesh stops (CMSs), one or more shifters (e.g., barrel shifter(s)) and/or other circuitry may be present. The microprocessor 600 is a semiconductor device fabricated to include many transistors interconnected to implement the structures described above in one or more integrated circuits (ICs) contained in one or more packages. The processor circuitry may include and/or cooperate with one or more accelerators. In some examples, accelerators are implemented by logic circuitry to perform certain tasks more quickly and/or efficiently than can be done by a general purpose processor. Examples of accelerators include ASICs and FPGAs such as those discussed herein. A GPU or other programmable device can also be an accelerator. Accelerators may be on-board the processor circuitry, in the same chip package as the processor circuitry and/or in one or more separate packages from the processor circuitry.

FIG. 7 is a block diagram of another example implementation of theprocessor circuitry 512 of FIG. 5. In this example, the processorcircuitry 512 is implemented by FPGA circuitry 700. The FPGA circuitry700 can be used, for example, to perform operations that could otherwisebe performed by the example microprocessor 600 of FIG. 6 executingcorresponding machine readable instructions. However, once configured,the FPGA circuitry 700 instantiates the machine readable instructions inhardware and, thus, can often execute the operations faster than theycould be performed by a general purpose microprocessor executing thecorresponding software.

More specifically, in contrast to the microprocessor 600 of FIG. 6 described above (which is a general purpose device that may be programmed to execute some or all of the machine readable instructions represented by the flowchart of FIG. 3 but whose interconnections and logic circuitry are fixed once fabricated), the FPGA circuitry 700 of the example of FIG. 7 includes interconnections and logic circuitry that may be configured and/or interconnected in different ways after fabrication to instantiate, for example, some or all of the machine readable instructions represented by the flowchart of FIG. 3. In particular, the FPGA circuitry 700 may be thought of as an array of logic gates, interconnections, and switches. The switches can be programmed to change how the logic gates are interconnected by the interconnections, effectively forming one or more dedicated logic circuits (unless and until the FPGA circuitry 700 is reprogrammed). The configured logic circuits enable the logic gates to cooperate in different ways to perform different operations on data received by input circuitry. Those operations may correspond to some or all of the software represented by the flowchart of FIG. 3. As such, the FPGA circuitry 700 may be structured to effectively instantiate some or all of the machine readable instructions of the flowchart of FIG. 3 as dedicated logic circuits to perform the operations corresponding to those software instructions in a dedicated manner analogous to an ASIC. Therefore, the FPGA circuitry 700 may perform the operations corresponding to some or all of the machine readable instructions of FIG. 3 faster than the general purpose microprocessor can execute the same.

In the example of FIG. 7, the FPGA circuitry 700 is structured to be programmed (and/or reprogrammed one or more times) by an end user by a hardware description language (HDL) such as Verilog. The FPGA circuitry 700 of FIG. 7 includes example input/output (I/O) circuitry 702 to obtain and/or output data to/from example configuration circuitry 704 and/or external hardware (e.g., external hardware circuitry) 706. For example, the configuration circuitry 704 may implement interface circuitry that may obtain machine readable instructions to configure the FPGA circuitry 700, or portion(s) thereof. In some such examples, the configuration circuitry 704 may obtain the machine readable instructions from a user, a machine (e.g., hardware circuitry (e.g., programmed or dedicated circuitry) that may implement an Artificial Intelligence/Machine Learning (AI/ML) model to generate the instructions), etc. In some examples, the external hardware 706 may implement the microprocessor 600 of FIG. 6. The FPGA circuitry 700 also includes an array of example logic gate circuitry 708, a plurality of example configurable interconnections 710, and example storage circuitry 712. The logic gate circuitry 708 and interconnections 710 are configurable to instantiate one or more operations that may correspond to at least some of the machine readable instructions of FIG. 3 and/or other desired operations. The logic gate circuitry 708 shown in FIG. 7 is fabricated in groups or blocks. Each block includes semiconductor-based electrical structures that may be configured into logic circuits. In some examples, the electrical structures include logic gates (e.g., AND gates, OR gates, NOR gates, etc.) that provide basic building blocks for logic circuits. Electrically controllable switches (e.g., transistors) are present within each of the logic gate circuitry 708 to enable configuration of the electrical structures and/or the logic gates to form circuits to perform desired operations. The logic gate circuitry 708 may include other electrical structures such as look-up tables (LUTs), registers (e.g., flip-flops or latches), multiplexers, etc.
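As a purely illustrative aid, the following Python sketch models in software how a look-up table (LUT) of the kind mentioned above might be configured, and later reconfigured, to behave as different logic gates. The class name LookUpTable and its methods are hypothetical and are not part of this disclosure; an actual FPGA would be configured through an HDL tool flow rather than through code of this kind.

```python
from itertools import product


class LookUpTable:
    """Hypothetical software model of a k-input LUT: a truth table indexed by input bits."""

    def __init__(self, num_inputs: int):
        self.num_inputs = num_inputs
        # One configuration bit per input combination; all zeros until programmed.
        self.config = [0] * (2 ** num_inputs)

    def program(self, func) -> None:
        """Load the truth table of func, a Boolean function of num_inputs bits."""
        for bits in product((0, 1), repeat=self.num_inputs):
            self.config[self._index(bits)] = int(bool(func(*bits)))

    def evaluate(self, *bits: int) -> int:
        """Return the stored output bit for the given input bits."""
        return self.config[self._index(bits)]

    @staticmethod
    def _index(bits) -> int:
        # Pack the input bits into an integer address, first input as the high bit.
        index = 0
        for bit in bits:
            index = (index << 1) | (bit & 1)
        return index


inputs = list(product((0, 1), repeat=2))

lut = LookUpTable(2)
lut.program(lambda a, b: a and b)        # configured as an AND gate
and_outputs = [lut.evaluate(a, b) for a, b in inputs]

lut.program(lambda a, b: not (a or b))   # same structure, reconfigured as a NOR gate
nor_outputs = [lut.evaluate(a, b) for a, b in inputs]
```

In this sketch the same structure yields AND behavior and then NOR behavior after reprogramming, which loosely mirrors how configurable electrical structures may form different logic circuits after fabrication.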

The interconnections 710 of the illustrated example are conductive pathways, traces, vias, or the like that may include electrically controllable switches (e.g., transistors) whose state can be changed by programming (e.g., using an HDL instruction language) to activate or deactivate one or more connections between one or more of the logic gate circuitry 708 to program desired logic circuits.

The storage circuitry 712 of the illustrated example is structured to store result(s) of the one or more of the operations performed by corresponding logic gates. The storage circuitry 712 may be implemented by registers or the like. In the illustrated example, the storage circuitry 712 is distributed amongst the logic gate circuitry 708 to facilitate access and increase execution speed.

The example FPGA circuitry 700 of FIG. 7 also includes example Dedicated Operations Circuitry 714. In this example, the Dedicated Operations Circuitry 714 includes special purpose circuitry 716 that may be invoked to implement commonly used functions to avoid the need to program those functions in the field. Examples of such special purpose circuitry 716 include memory (e.g., DRAM) controller circuitry, PCIe controller circuitry, clock circuitry, transceiver circuitry, memory, and multiplier-accumulator circuitry. Other types of special purpose circuitry may be present. In some examples, the FPGA circuitry 700 may also include example general purpose programmable circuitry 718 such as an example CPU 720 and/or an example DSP 722. Other general purpose programmable circuitry 718 may additionally or alternatively be present such as a GPU, an XPU, etc., that can be programmed to perform other operations.

Although FIGS. 6 and 7 illustrate two example implementations of the processor circuitry 512 of FIG. 5, many other approaches are contemplated. For example, as mentioned above, modern FPGA circuitry may include an on-board CPU, such as one or more of the example CPU 720 of FIG. 7. Therefore, the processor circuitry 512 of FIG. 5 may additionally be implemented by combining the example microprocessor 600 of FIG. 6 and the example FPGA circuitry 700 of FIG. 7. In some such hybrid examples, a first portion of the machine readable instructions represented by the flowchart of FIG. 3 may be executed by one or more of the cores 602 of FIG. 6, a second portion of the machine readable instructions represented by the flowchart of FIG. 3 may be executed by the FPGA circuitry 700 of FIG. 7, and/or a third portion of the machine readable instructions represented by the flowchart of FIG. 3 may be executed by an ASIC. It should be understood that some or all of the circuitry of FIG. 1 may, thus, be instantiated at the same or different times. Some or all of the circuitry may be instantiated, for example, in one or more threads executing concurrently and/or in series. Moreover, in some examples, some or all of the circuitry of FIG. 1 may be implemented within one or more virtual machines and/or containers executing on the microprocessor.

In some examples, the processor circuitry 512 of FIG. 5 may be in one or more packages. For example, the microprocessor 600 of FIG. 6 and/or the FPGA circuitry 700 of FIG. 7 may be in one or more packages. In some examples, an XPU may be implemented by the processor circuitry 512 of FIG. 5, which may be in one or more packages. For example, the XPU may include a CPU in one package, a DSP in another package, a GPU in yet another package, and an FPGA in still yet another package.

A block diagram illustrating an example software distribution platform 805 to distribute software such as the example machine readable instructions 532 of FIG. 5 to hardware devices owned and/or operated by third parties is illustrated in FIG. 8. The example software distribution platform 805 may be implemented by any computer server, data facility, cloud service, etc., capable of storing and transmitting software to other computing devices. The third parties may be customers of the entity owning and/or operating the software distribution platform 805. For example, the entity that owns and/or operates the software distribution platform 805 may be a developer, a seller, and/or a licensor of software such as the example machine readable instructions 532 of FIG. 5. The third parties may be consumers, users, retailers, OEMs, etc., who purchase and/or license the software for use and/or re-sale and/or sub-licensing. In the illustrated example, the software distribution platform 805 includes one or more servers and one or more storage devices. The storage devices store the machine readable instructions 532, which may correspond to the example machine readable instructions 300, 400 of FIGS. 3 and/or 4, as described above. The one or more servers of the example software distribution platform 805 are in communication with a network 810, which may correspond to any one or more of the Internet and/or any of the example networks 526 described above. In some examples, the one or more servers are responsive to requests to transmit the software to a requesting party as part of a commercial transaction. Payment for the delivery, sale, and/or license of the software may be handled by the one or more servers of the software distribution platform and/or by a third party payment entity. The servers enable purchasers and/or licensees to download the machine readable instructions 532 from the software distribution platform 805. For example, the software, which may correspond to the example machine readable instructions 532 of FIG. 5, may be downloaded to the example processor platform 500, which is to execute the machine readable instructions 532 to implement the example knowledge builder circuitry 105, the example model builder circuitry 115, and/or the example target hardware 120 of FIG. 1. In some examples, one or more servers of the software distribution platform 805 periodically offer, transmit, and/or force updates to the software (e.g., the example machine readable instructions 532 of FIG. 5) to ensure improvements, patches, updates, etc., are distributed and applied to the software at the end user devices.

From the foregoing, it will be appreciated that example systems, methods, apparatus, and articles of manufacture have been disclosed that enable neural architecture search to be performed based on prior knowledge of models created to perform particular tasks. Disclosed systems, methods, apparatus, and articles of manufacture improve the efficiency of using a computing device by avoiding re-discovery of models that would otherwise be initially discovered by neural architecture search, but that do not function well for the intended task. By starting from prior knowledge, higher performing models can be identified more quickly. This reduces resource consumption not only on the target hardware (e.g., more efficient models can be developed), but also on systems that generate models (e.g., higher performing models can be discovered more quickly/efficiently). Disclosed systems, methods, apparatus, and articles of manufacture are accordingly directed to one or more improvement(s) in the operation of a machine such as a computer or other electronic and/or mechanical device.
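The benefit described above may be illustrated by the following minimal Python sketch, offered under stated assumptions rather than as the disclosed implementation. All names (PriorModel, generate_task_knowledge, create_search_space, architecture_search, toy_evaluate), the two-parameter architecture description, and the sample records are hypothetical. An exhaustive enumeration stands in for neural architecture search, and a toy scoring function stands in for training and evaluating a candidate model.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class PriorModel:
    """Hypothetical record describing a previously generated model."""
    task: str
    hardware: str
    architecture: dict
    score: float


# Assumed unconstrained search space (illustrative values only).
FULL_SPACE = {"depth": [2, 4, 8, 16, 32], "width": [32, 64, 128, 256, 512]}


def generate_task_knowledge(prior_models, task, hardware, top_k=2):
    """Keep the best-scoring prior models for the requested task and hardware."""
    matching = [m for m in prior_models if m.task == task and m.hardware == hardware]
    return sorted(matching, key=lambda m: m.score, reverse=True)[:top_k]


def create_search_space(task_knowledge, full_space):
    """Narrow each parameter to the range spanned by the selected prior architectures."""
    if not task_knowledge:
        return {name: list(choices) for name, choices in full_space.items()}
    space = {}
    for name, choices in full_space.items():
        seen = [m.architecture[name] for m in task_knowledge]
        low, high = min(seen), max(seen)
        # Fall back to the full range if no listed choice lies in the narrowed range.
        space[name] = [c for c in choices if low <= c <= high] or list(choices)
    return space


def architecture_search(space, evaluate):
    """Stand-in for neural architecture search: score every candidate in the space."""
    names = list(space)
    candidates = [dict(zip(names, values))
                  for values in product(*(space[n] for n in names))]
    return max(candidates, key=evaluate), len(candidates)


def toy_evaluate(arch):
    # Toy score that peaks at depth 8 and width 128 (not a real training run).
    return -abs(arch["depth"] - 8) - abs(arch["width"] - 128) / 32


prior_models = [
    PriorModel("image_classification", "cpu", {"depth": 4, "width": 128}, 0.81),
    PriorModel("image_classification", "cpu", {"depth": 16, "width": 256}, 0.84),
    PriorModel("image_classification", "cpu", {"depth": 32, "width": 512}, 0.62),
    PriorModel("speech_recognition", "cpu", {"depth": 2, "width": 32}, 0.90),
]

knowledge = generate_task_knowledge(prior_models, "image_classification", "cpu")
search_space = create_search_space(knowledge, FULL_SPACE)
best, evaluated = architecture_search(search_space, toy_evaluate)
_, evaluated_full = architecture_search(FULL_SPACE, toy_evaluate)
```

In this toy setting the narrowed space contains 6 candidates rather than the 25 of the full space, yet still contains the best-scoring candidate. Whether a narrowed space retains the best candidate in practice depends on the quality of the prior knowledge.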

Example methods, apparatus, systems, and articles of manufacture for data enhanced automated model generation are disclosed herein. Further examples and combinations thereof include the following:

Example 1 includes an apparatus for data enhanced automated model generation, the apparatus comprising interface circuitry to access a request to generate a machine learning model, and processor circuitry including one or more of at least one of a central processing unit, a graphic processing unit, or a digital signal processor, the at least one of the central processing unit, the graphic processing unit, or the digital signal processor having control circuitry to control data movement within the processor circuitry, arithmetic and logic circuitry to perform one or more first operations corresponding to instructions, and one or more registers to store a result of the one or more first operations, the instructions in the apparatus, a Field Programmable Gate Array (FPGA), the FPGA including logic gate circuitry, a plurality of configurable interconnections, and storage circuitry, the logic gate circuitry and interconnections to perform one or more second operations, the storage circuitry to store a result of the one or more second operations, or Application Specific Integrated Circuitry (ASIC) including logic gate circuitry to perform one or more third operations, the processor circuitry to perform at least one of the first operations, the second operations, or the third operations to instantiate task data orchestration circuitry to generate task knowledge based on a previously generated machine learning model, search space management circuitry to create a search space based on the task knowledge, and neural architecture search circuitry to generate the machine learning model using neural architecture search, the neural architecture search circuitry to begin an architecture search based on the search space.

Example 2 includes the apparatus of example 1, wherein the processor circuitry is to, during generation of the machine learning model, insert a plurality of anchor points into the machine learning model, the anchor points to be used for collection of a performance statistic concerning execution of the machine learning model.

Example 3 includes the apparatus of example 2, wherein the performance statistic includes at least one of power efficiency or energy efficiency.

Example 4 includes the apparatus of example 2, wherein the processor circuitry is further to collect the performance statistic based on the anchor points.

Example 5 includes the apparatus of example 4, wherein, to generate the task knowledge, the processor circuitry is further to rank features of the previously generated machine learning model.

Example 6 includes the apparatus of example 1, wherein to create the search space, the processor circuitry is to select a prior architecture based on performance of the prior architecture on a selected hardware.

Example 7 includes at least one non-transitory computer readable storage medium comprising instructions that, when executed, cause at least one processor to at least access a request to generate a machine learning model to perform a selected task, generate task knowledge based on a previously generated machine learning model, create a search space based on the task knowledge, and generate a machine learning model using neural architecture search, the neural architecture search beginning based on the search space.

Example 8 includes the at least one non-transitory computer readable storage medium of example 7, wherein the instructions, when executed, further cause the at least one processor to insert a plurality of anchor points into the machine learning model, the anchor points to be used when collecting a performance statistic concerning execution of the machine learning model.

Example 9 includes the at least one non-transitory computer readable storage medium of example 8, wherein the instructions, when executed, further cause the at least one processor to collect the performance statistic based on the anchor points.

Example 10 includes the at least one non-transitory computer readable storage medium of example 9, wherein the instructions, when executed, further cause the at least one processor to rank features of the previously generated machine learning model to generate the task knowledge.

Example 11 includes the at least one non-transitory computer readable storage medium of example 7, wherein the instructions, when executed, further cause the at least one processor to select a prior architecture based on performance of the prior architecture on a selected hardware to create the search space.

Example 12 includes a method for data enhanced automated model generation, the method comprising accessing a request to generate a machine learning model to perform a selected task, generating task knowledge based on a previously generated machine learning model, creating a search space based on the task knowledge, and generating a machine learning model using neural architecture search, the neural architecture search beginning based on the search space.

Example 13 includes the method of example 12, further including, during generation of the machine learning model, inserting a plurality of anchor points into the machine learning model, the anchor points to be used when collecting a performance statistic concerning execution of the machine learning model.

Example 14 includes the method of example 13, further including collecting the performance statistic based on the anchor points.

Example 15 includes the method of example 14, wherein the generation of the task knowledge includes ranking features of the previously generated machine learning model.

Example 16 includes the method of example 12, wherein the creation of the search space includes selecting a prior architecture based on performance of the prior architecture on a selected hardware.

Example 17 includes an apparatus for data enhanced automated model generation, the apparatus comprising means for accessing a request to generate a machine learning model to perform a selected task, means for generating task knowledge based on a previously generated machine learning model, means for creating a search space based on the task knowledge, and means for generating a machine learning model using neural architecture search, the neural architecture search beginning based on the search space.

Example 18 includes the apparatus of example 17, further including means for inserting, during generation of the machine learning model, a plurality of anchor points into the machine learning model, the anchor points to be used when collecting a performance statistic concerning execution of the machine learning model.

Example 19 includes the apparatus of example 18, further including means for collecting the performance statistic based on the anchor points.

Example 20 includes the apparatus of example 19, wherein the means for generating is further to rank features of the previously generated machine learning model.

Example 21 includes the apparatus of example 17, wherein the means for creating is to select a prior architecture based on performance of the prior architecture on a selected hardware.
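For illustration only, the following Python sketch shows one possible reading of the anchor points recited in the foregoing examples: pass-through markers inserted between the stages of a model that record when execution reaches them, from which a per-stage statistic is later collected. The names AnchorPoint, insert_anchor_points, run_model, and collect_statistics are hypothetical, the toy two-stage model is an assumption, and elapsed time is used as a stand-in statistic because power and energy efficiency cannot be measured portably in a short example.

```python
import time


class AnchorPoint:
    """Hypothetical pass-through marker that logs a timestamp when reached."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __call__(self, value):
        self.log.append((self.name, time.perf_counter()))
        return value  # data flows through the anchor point unchanged


def insert_anchor_points(layers, log):
    """Return a model with an anchor point before the first layer and after each layer."""
    instrumented = [AnchorPoint("start", log)]
    for name, layer in layers:
        instrumented.append(layer)
        instrumented.append(AnchorPoint(name, log))
    return instrumented


def run_model(model, value):
    """Execute each stage of the (instrumented) model in order."""
    for stage in model:
        value = stage(value)
    return value


def collect_statistics(log):
    """Elapsed seconds between consecutive anchor points, keyed by the later anchor."""
    stats = {}
    for (_, t_prev), (name, t_curr) in zip(log, log[1:]):
        stats[name] = t_curr - t_prev
    return stats


# Toy model: a scaling stage followed by a ReLU-like stage (illustrative only).
layers = [
    ("scale", lambda xs: [2 * v for v in xs]),
    ("relu", lambda xs: [max(0, v) for v in xs]),
]

log = []
model = insert_anchor_points(layers, log)
output = run_model(model, [-1, 2, 3])
stats = collect_statistics(log)
```

In this sketch the anchor points leave the model output unchanged, and the collected statistics are keyed by stage name so that they could, for example, feed a later ranking of features of the previously generated model.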

The following claims are hereby incorporated into this Detailed Description by this reference. Although certain example systems, methods, apparatus, and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all systems, methods, apparatus, and articles of manufacture fairly falling within the scope of the claims of this patent.

What is claimed is:
 1. An apparatus for data enhanced automated model generation, the apparatus comprising: interface circuitry to access a request to generate a machine learning model; and processor circuitry including one or more of: at least one of a central processing unit, a graphic processing unit, or a digital signal processor, the at least one of the central processing unit, the graphic processing unit, or the digital signal processor having control circuitry to control data movement within the processor circuitry, arithmetic and logic circuitry to perform one or more first operations corresponding to instructions, and one or more registers to store a result of the one or more first operations, the instructions in the apparatus; a Field Programmable Gate Array (FPGA), the FPGA including logic gate circuitry, a plurality of configurable interconnections, and storage circuitry, the logic gate circuitry and interconnections to perform one or more second operations, the storage circuitry to store a result of the one or more second operations; or Application Specific Integrated Circuitry (ASIC) including logic gate circuitry to perform one or more third operations; the processor circuitry to perform at least one of the first operations, the second operations, or the third operations to instantiate: task data orchestration circuitry to generate task knowledge based on a previously generated machine learning model; search space management circuitry to create a search space based on the task knowledge; and neural architecture search circuitry to generate the machine learning model using neural architecture search, the neural architecture search circuitry to begin an architecture search based on the search space.
 2. The apparatus of claim 1, wherein the processor circuitry is to, during generation of the machine learning model, insert a plurality of anchor points into the machine learning model, the anchor points to be used for collection of a performance statistic concerning execution of the machine learning model.
 3. The apparatus of claim 2, wherein the performance statistic includes at least one of power efficiency or energy efficiency.
 4. The apparatus of claim 2, wherein the processor circuitry is further to collect the performance statistic based on the anchor points.
 5. The apparatus of claim 4, wherein, to generate the task knowledge, the processor circuitry is further to rank features of the previously generated machine learning model.
 6. The apparatus of claim 1, wherein to create the search space, the processor circuitry is to select a prior architecture based on performance of the prior architecture on a selected hardware.
 7. At least one non-transitory computer readable storage medium comprising instructions that, when executed, cause at least one processor to at least: access a request to generate a machine learning model to perform a selected task; generate task knowledge based on a previously generated machine learning model; create a search space based on the task knowledge; and generate a machine learning model using neural architecture search, the neural architecture search beginning based on the search space.
 8. The at least one non-transitory computer readable storage medium of claim 7, wherein the instructions, when executed, further cause the at least one processor to insert a plurality of anchor points into the machine learning model, the anchor points to be used when collecting a performance statistic concerning execution of the machine learning model.
 9. The at least one non-transitory computer readable storage medium of claim 8, wherein the instructions, when executed, further cause the at least one processor to collect the performance statistic based on the anchor points.
 10. The at least one non-transitory computer readable storage medium of claim 9, wherein the instructions, when executed, further cause the at least one processor to rank features of the previously generated machine learning model to generate the task knowledge.
 11. The at least one non-transitory computer readable storage medium of claim 7, wherein the instructions, when executed, further cause the at least one processor to select a prior architecture based on performance of the prior architecture on a selected hardware to create the search space.
 12. A method for data enhanced automated model generation, the method comprising: accessing a request to generate a machine learning model to perform a selected task; generating task knowledge based on a previously generated machine learning model; creating a search space based on the task knowledge; and generating a machine learning model using neural architecture search, the neural architecture search beginning based on the search space.
 13. The method of claim 12, further including, during generation of the machine learning model, inserting a plurality of anchor points into the machine learning model, the anchor points to be used when collecting a performance statistic concerning execution of the machine learning model.
 14. The method of claim 13, further including collecting the performance statistic based on the anchor points.
 15. The method of claim 14, wherein the generation of the task knowledge includes ranking features of the previously generated machine learning model.
 16. The method of claim 12, wherein the creation of the search space includes selecting a prior architecture based on performance of the prior architecture on a selected hardware.
 17. An apparatus for data enhanced automated model generation, the apparatus comprising: means for accessing a request to generate a machine learning model to perform a selected task; means for generating task knowledge based on a previously generated machine learning model; means for creating a search space based on the task knowledge; and means for generating a machine learning model using neural architecture search, the neural architecture search beginning based on the search space.
 18. The apparatus of claim 17, further including means for inserting, during generation of the machine learning model, a plurality of anchor points into the machine learning model, the anchor points to be used when collecting a performance statistic concerning execution of the machine learning model.
 19. The apparatus of claim 18, further including means for collecting the performance statistic based on the anchor points.
 20. The apparatus of claim 19, wherein the means for generating is further to rank features of the previously generated machine learning model.
 21. The apparatus of claim 17, wherein the means for creating is to select a prior architecture based on performance of the prior architecture on a selected hardware.